Multi-Armed Bandits with Metric Movement Costs

Authors

  • Tomer Koren
  • Roi Livni
  • Yishay Mansour
Abstract

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure C of the underlying metric which depends on its covering numbers. In finite metric spaces with k actions, we give an efficient algorithm that achieves regret of the form Õ(max{C^{1/3}T^{2/3}, √(kT)}), and show that this is the best possible. Our regret bound generalizes previously known regret bounds for some special cases: (i) the unit-switching cost regret Θ̃(max{k^{1/3}T^{2/3}, √(kT)}) where C = Θ(k), and (ii) the interval metric with regret Θ̃(max{T^{2/3}, √(kT)}) where C = Θ(1). For infinite metric spaces with Lipschitz loss functions, we derive a tight regret bound of Θ̃(T^{(d+1)/(d+2)}) where d ≥ 1 is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.
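To see how the two special cases follow from the general bound, substitute the stated values of C into Õ(max{C^{1/3}T^{2/3}, √(kT)}); the matching lower bounds make the resulting rates tight:

\[
\begin{aligned}
C = \Theta(k) \;&\Longrightarrow\; \tilde{O}\big(\max\{C^{1/3}T^{2/3},\ \sqrt{kT}\}\big) = \tilde{\Theta}\big(\max\{k^{1/3}T^{2/3},\ \sqrt{kT}\}\big),\\
C = \Theta(1) \;&\Longrightarrow\; \tilde{O}\big(\max\{C^{1/3}T^{2/3},\ \sqrt{kT}\}\big) = \tilde{\Theta}\big(\max\{T^{2/3},\ \sqrt{kT}\}\big).
\end{aligned}
\]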


Similar articles

Budgeted Bandit Problems with Continuous Random Costs

We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm-pulling algorithm to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...


Multi-armed bandits on implicit metric spaces

The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks that focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of a small number of arms is by now well understood, a lot of re...


On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs

This paper discusses how to efficiently choose from n unknown distributions the k ones whose means are the greatest by a certain metric, up to a small relative error. We study the topic under two standard settings—multi-armed bandits and hidden bipartite graphs—which differ in the nature of the input distributions. In the former setting, each distribution can be sampled (in an i.i.d. manner) a...


Strategic Exit with Random Observations

In standard optimal stopping problems, actions are artificially restricted to the moments at which costs or benefits are observed. In standard experimentation and learning models based on two-armed Poisson bandits, it is possible to take an action between two sequential observations. The latter models do not recognize that the timing of decisions depends not only on the rate of arriva...


Thompson Sampling for Budgeted Multi-Armed Bandits

Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to Budgeted MAB, where there is a random cost for pulling an arm and the total cost is constrained by a budget. We start with the case of Bernoulli bandits, in which the random rewards (costs) of an arm are independently sampled from a Bernoulli distribution...
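As a rough illustration of this setting, below is a minimal Python sketch of Thompson sampling with Bernoulli rewards and Bernoulli costs under a budget. The Beta-Bernoulli posterior updates are standard; the reward-to-cost ratio used to select an arm and the helper name budgeted_thompson_sampling are illustrative assumptions, not necessarily the exact rule proposed in the paper.

import random

def budgeted_thompson_sampling(arms, budget):
    """Sketch of Thompson sampling for budgeted Bernoulli bandits.

    `arms` is a list of (reward_prob, cost_prob) pairs used only to
    simulate the environment; the learner never observes them directly.
    """
    k = len(arms)
    # Beta(1, 1) priors on each arm's reward and cost probabilities: [alpha, beta].
    reward_post = [[1, 1] for _ in range(k)]
    cost_post = [[1, 1] for _ in range(k)]
    total_reward, spent = 0, 0

    while spent < budget:
        # Draw one sample per arm from each posterior and pull the arm with
        # the largest sampled reward-to-cost ratio (illustrative selection rule).
        def score(i):
            r = random.betavariate(*reward_post[i])
            c = random.betavariate(*cost_post[i])
            return r / max(c, 1e-9)
        i = max(range(k), key=score)

        # Simulate a pull: Bernoulli reward and Bernoulli cost.
        reward = int(random.random() < arms[i][0])
        cost = int(random.random() < arms[i][1])

        # Standard Beta-Bernoulli posterior updates.
        reward_post[i][0] += reward
        reward_post[i][1] += 1 - reward
        cost_post[i][0] += cost
        cost_post[i][1] += 1 - cost

        total_reward += reward
        spent += cost

    return total_reward

For example, budgeted_thompson_sampling([(0.7, 0.5), (0.4, 0.2)], budget=100) simulates a two-armed instance and pulls arms until the budget is exhausted.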



Publication year: 2017